Systems and methods for object detection

ABSTRACT

A method performed by an electronic device is described. The method includes receiving a set of images. The method also includes determining a motion region and a static region based on the set of images. The method further includes extracting, at a first rate, first features from the motion region. The method additionally includes extracting, at a second rate that is different from the first rate, second features from the static region. The method also includes caching the second features. The method further includes detecting at least one object based on at least a portion of the first features.

FIELD OF DISCLOSURE

The present disclosure relates generally to electronic devices. More specifically, the present disclosure relates to systems and methods for object detection.

BACKGROUND

Some electronic devices (e.g., cameras, video camcorders, digital cameras, cellular phones, smart phones, computers, televisions, automobiles, personal cameras, action cameras, surveillance cameras, security cameras, mounted cameras, connected cameras, Internet Protocol (IP) cameras, robots, drones, smart applications, healthcare equipment, set-top boxes, etc.) capture and/or utilize images. For example, a smart phone may capture and/or process still and/or video images. Processing images may demand an amount of time, memory, and energy resources. The resources demanded may vary in accordance with the complexity of the processing.

Interpreting image data may be particularly complex in some cases. For example, interpreting large amounts of image data may demand a large amount of processing resources. As can be observed from this discussion, systems and methods that improve image processing may be beneficial.

SUMMARY

A method performed by an electronic device is described. The method includes receiving a set of images. The method also includes determining a motion region and a static region based on the set of images. The method further includes extracting, at a first rate, first features from the motion region. The method additionally includes extracting, at a second rate that is different from the first rate, second features from the static region. The method also includes caching the second features. The method further includes detecting at least one object based on at least a portion of the first features. Extracting the first features and extracting the second features may be performed on a union of the motion region and one or more regions of interest (ROIs) in the static region. The second rate may be lower than the first rate. The first features and the second features may be cached in a shared feature map.

The method may include detecting movement in the static region. The method may also include retrieving information from a cache in response to the detected movement. The information may include cached features. The method may also include determining a region of interest (ROI) based on the cached features and identifying an object based on the ROI and the cached features. The information may include a label. The method may include presenting the label.

A first operation thread that operates at the first rate may include extracting the first features and detecting the at least one object based on the at least a portion of the first features. A second operation thread that operates at the second rate may include extracting the second features and caching the second features. The second operation thread may further include determining at least one region of interest (ROI) in the static region and detecting at least one object based on at least a portion of the second features in the at least one ROI.

The at least one ROI may include a set of ROIs in the static region. Extracting the second features may include extracting features from at most a subset of the set of ROIs for each image of the set of images.

The method may include classifying the at least one object to produce at least one label. The method may also include presenting the at least one label.

An electronic device is also described. The electronic device includes a processor. The processor is configured to receive a set of images. The processor is also configured to determine a motion region and a static region based on the set of images. The processor is further configured to extract, at a first rate, first features from the motion region. The processor is additionally configured to extract, at a second rate that is different from the first rate, second features from the static region. The processor is also configured to cache the second features. The processor is further configured to detect at least one object based on at least a portion of the first features.

An apparatus is also described. The apparatus includes means for receiving a set of images. The apparatus also includes means for determining a motion region and a static region based on the set of images. The apparatus further includes means for extracting, at a first rate, first features from the motion region. The apparatus additionally includes means for extracting, at a second rate that is different from the first rate, second features from the static region. The apparatus also includes means for caching the second features. The apparatus further includes means for detecting at least one object based on at least a portion of the first features.

A non-transitory tangible computer-readable medium storing computer executable code is also described. The computer-readable medium includes code for causing an electronic device to receive a set of images. The computer-readable medium also includes code for causing the electronic device to determine a motion region and a static region based on the set of images. The computer-readable medium further includes code for causing the electronic device to extract, at a first rate, first features from the motion region. The computer-readable medium additionally includes code for causing the electronic device to extract, at a second rate that is different from the first rate, second features from the static region. The computer-readable medium also includes code for causing the electronic device to cache the second features. The computer-readable medium further includes code for causing the electronic device to detect at least one object based on at least a portion of the first features.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating one example of an electronic device in which systems and methods for object detection may be implemented;

FIG. 2 is a flow diagram illustrating one configuration of a method for object detection;

FIG. 3 is a block diagram illustrating an example of object detection in accordance with some configurations of the systems and methods disclosed herein;

FIG. 4 is a diagram illustrating an example of object detection in accordance with some configurations of the systems and methods disclosed herein;

FIG. 5 is a diagram illustrating another example of object detection in accordance with some configurations of the systems and methods disclosed herein;

FIG. 6 is a block diagram illustrating an example of elements that may be implemented in accordance with some configurations of the systems and methods disclosed herein;

FIG. 7 is a flow diagram illustrating a more specific configuration of a method for object detection;

FIG. 8 is a flow diagram illustrating a configuration of a method 800 for presenting detection results; and

FIG. 9 illustrates certain components that may be included within an electronic device.

DETAILED DESCRIPTION

The systems and methods disclosed herein may relate to object detection. For example, some configurations of the systems and methods disclosed herein may relate to a differential rate (e.g., synchronous/asynchronous) region-based convolutional neural network for object detection.

Some object detection techniques may utilize a region-based convolutional neural network (RCNN) framework to achieve high object detection accuracy. Techniques based on this framework may achieve superior performance because the framework divides the object detection task into a two-stage flow. In a first stage of the network, for example, the feature(s) of the input frame are extracted by a convolutional neural network, and the feature(s) may be sent to a region proposal generator to generate several regions of interest (ROIs), which may be referred to as region proposals. Each ROI and feature map may be sent to the second stage of the network to identify the bounding box and class label of the ROI. These techniques are advantageous in terms of accuracy, but suffer from a high computational requirement.

For example, for a frame of size H×W with N ROIs, the inference time of RCNN object detection is composed of the time for performing convolutional neural network (CNN) feature extraction, region proposal generation, and N feature alignment and detection operations. Hence, the computation complexity can be represented as O(HW)+O(1)+O(N). The use of an RCNN may result in speed and accuracy trade-offs. For example, a higher resolution (larger width (W) and height (H)) input image may provide better detection accuracy. One of the reasons for this is that a small (or distant) object in the high-resolution image possesses more pixels for helping the object detection task as compared to a low-resolution image. However, the computation requirement grows as the input resolution increases. On the other hand, the number of ROIs plays a significant role in the performance of object detection. Increasing the number of ROIs allows a small and low-confidence object to be better scrutinized in the second part of the network. However, this introduces a linear complexity as the number of ROIs increases. For instance, in the use case of deploying an RCNN object detector for real-time surveillance in a 4K IP camera (H=2160, W=3840, N=300), the computation load may be too large for real-time application on some platforms.
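To make the scaling concrete, the following sketch evaluates the O(HW)+O(1)+O(N) decomposition for the 4K example above against a 720p input. The per-pixel and per-ROI cost constants are illustrative assumptions, not measurements:

```python
# Illustrative only: relative RCNN inference cost using the O(HW) + O(1) + O(N)
# decomposition above. The cost constants below are assumed for illustration.
FEATURE_COST_PER_PIXEL = 1.0   # assumed cost units per pixel of CNN feature extraction
PROPOSAL_COST = 1e5            # assumed fixed cost of region proposal generation
DETECT_COST_PER_ROI = 1e4      # assumed cost units per ROI for alignment + detection

def rcnn_cost(height, width, num_rois):
    """Approximate per-frame cost: O(HW) + O(1) + O(N)."""
    return (FEATURE_COST_PER_PIXEL * height * width
            + PROPOSAL_COST
            + DETECT_COST_PER_ROI * num_rois)

cost_4k = rcnn_cost(2160, 3840, 300)    # 4K surveillance example from the text
cost_720p = rcnn_cost(720, 1280, 300)
print(f"4K / 720p cost ratio: {cost_4k / cost_720p:.1f}x")
```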

Some configurations of the systems and methods disclosed herein may be implemented for object detection in a fixed field of view (FOV) with a static (e.g., stationary) camera. A static camera may be utilized in a variety of applications, such as video surveillance of a parking lot, street surveillance, and/or home security, etc. Users may be particularly interested in the localization and/or recognition of a moving object.

Some configurations may utilize the fact that most of the regions in the FOV of a static camera may remain static. Accordingly, some configurations may avoid the high computation load of computing the feature of an entire raw frame (e.g., a 3840×2160 video frame) by computing the feature of the motion regions, which may typically be a relatively small portion of the raw video frame. Some configurations may cache the computed feature and/or detection results of the current frame and/or earlier frames. This may enable utilizing previous computation to extrapolate for future prediction.

Some configurations of the systems and methods disclosed herein may utilize a framework based on an RCNN for object detection. The framework may utilize multiple (e.g., synchronous and asynchronous) threads. For example, the threads may include operations performed at different rates. In some approaches, the feature extraction task of an input frame may be divided into a first (e.g., synchronous) thread and a second (e.g., asynchronous) thread by a feature manager. The feature manager may utilize the output of a motion detector to set the priority of regions for feature computation. For instance, the feature(s) of the motion region(s) may be updated more frequently. It should be noted that the motion detector may be an object tracker, blob tracker, foreground detector, etc., which may output a motion map to indicate a motion region (e.g., part) and/or static region (e.g., part). For example, the motion map may be a binary mask or soft-value mask so that the feature manager can assign the priority based on the motion intensity.
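As one illustration of how a feature manager might use a soft-value motion map to prioritize regions, the sketch below averages motion intensity over fixed blocks; the block size and the use of mean intensity as the priority are assumptions for illustration, not the disclosed implementation:

```python
import numpy as np

def region_priorities(motion_map, block=64):
    """Assign a feature-update priority to each block of the frame based on
    mean motion intensity in a soft-value motion map (values in [0, 1]).
    Higher motion intensity -> higher priority (updated more frequently)."""
    h, w = motion_map.shape
    priorities = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            intensity = motion_map[y:y + block, x:x + block].mean()
            priorities[(y, x)] = intensity  # e.g., > 0.5 -> synchronous thread
    return priorities

# Example: a random soft motion map for a 256x256 frame (illustrative only)
motion_map = np.random.rand(256, 256)
prios = region_priorities(motion_map)
```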

Various configurations are now described with reference to the Figures, where like reference numbers may indicate functionally similar elements. The systems and methods as generally described and illustrated in the Figures herein could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of several configurations, as represented in the Figures, is not intended to limit scope, as claimed, but is merely representative of the systems and methods.

FIG. 1 is a block diagram illustrating one example of an electronic device 102 in which systems and methods for object detection may be implemented. Examples of the electronic device 102 may include cameras, video camcorders, digital cameras, cellular phones, smart phones, computers (e.g., desktop computers, laptop computers, etc.), tablet devices, media players, televisions, vehicles, automobiles, personal cameras, action cameras, surveillance cameras, security cameras, mounted cameras, connected cameras, IP cameras, robots, aircraft, drones, unmanned aerial vehicles (UAVs), healthcare equipment, gaming consoles, personal digital assistants (PDAs), set-top boxes, etc. The electronic device 102 may include one or more components or elements. One or more of the components or elements may be implemented in hardware (e.g., circuitry), in a combination of hardware and software (e.g., a processor with instructions), and/or in a combination of hardware and firmware.

In some configurations, the electronic device 102 may include a processor 112, a memory 126, a display 132, one or more image sensors 104, one or more optical systems 106, and/or a communication interface 108. The processor 112 may be coupled to (e.g., in electronic communication with) the memory 126, display 132, image sensor(s) 104, optical system(s) 106, and/or communication interface 108. It should be noted that one or more of the elements illustrated in FIG. 1 may be optional. In particular, the electronic device 102 may not include one or more of the elements illustrated in FIG. 1 in some configurations. For example, the electronic device 102 may or may not include an image sensor 104 and/or optical system(s) 106. Additionally or alternatively, the electronic device 102 may or may not include a display 132. Additionally or alternatively, the electronic device 102 may or may not include a communication interface 108.

In some configurations, the electronic device 102 may present a user interface 134 on the display 132. For example, the user interface 134 may enable a user to interact with the electronic device 102. In some configurations, the display 132 may be a touchscreen that receives input from physical touch (by a finger, stylus, or other tool, for example). Additionally or alternatively, the electronic device 102 may include or be coupled to another input interface. For example, the electronic device 102 may include a camera facing a user and may detect user gestures (e.g., hand gestures, arm gestures, eye tracking, eyelid blink, etc.). In another example, the electronic device 102 may be coupled to a mouse and may detect a mouse click. In some configurations, one or more of the images described herein (e.g., set of image frames, video, etc.) may be presented on the display 132 and/or user interface 134.

The communication interface 108 may enable the electronic device 102 to communicate with one or more other electronic devices. For example, the communication interface 108 may provide an interface for wired and/or wireless communications. In some configurations, the communication interface 108 may be coupled to one or more antennas 110 for transmitting and/or receiving radio frequency (RF) signals. Additionally or alternatively, the communication interface 108 may enable one or more kinds of wireline (e.g., Universal Serial Bus (USB), Ethernet, etc.) communication.

In some configurations, multiple communication interfaces 108 may be implemented and/or utilized. For example, one communication interface 108 may be a cellular (e.g., 3G, Long Term Evolution (LTE), CDMA, etc.) communication interface 108, another communication interface 108 may be an Ethernet interface, another communication interface 108 may be a universal serial bus (USB) interface, and yet another communication interface 108 may be a wireless local area network (WLAN) interface (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 interface).

The electronic device 102 (e.g., image obtainer 114) may obtain one or more images (e.g., digital images, image frames, frames, video, etc.). The one or more images (e.g., image frames) may be images of a scene (e.g., one or more objects and/or background). For example, the electronic device 102 may include one or more image sensors 104 and one or more optical systems 106 (e.g., lenses). An optical system 106 may focus images of objects that are located within the field of view of the optical system 106 onto an image sensor 104. The optical system(s) 106 may be coupled to and/or controlled by the processor 112 in some configurations.

A camera may include at least one image sensor and at least one optical system. Accordingly, the electronic device 102 may be one or more cameras and/or may include one or more cameras in some implementations. In some configurations, the image sensor(s) 104 may capture the one or more images (e.g., image frames, video, still images, burst mode images, etc.). In some implementations, the electronic device 102 may include multiple optical system(s) 106 and/or multiple image sensors 104. For example, the electronic device 102 may include multiple wide-angle lenses (e.g., fisheye lenses), multiple “normal” lenses, multiple telephoto lenses, and/or a combination of different kinds of lenses in some configurations. Different lenses may each be paired with separate image sensors 104 in some configurations. Additionally or alternatively, two or more lenses may share the same image sensor 104.

Additionally or alternatively, the electronic device 102 may request and/or receive the one or more images from another device (e.g., one or more external image sensors coupled to the electronic device 102, a network server, traffic camera, drop camera, automobile camera, web camera, smart phone camera, etc.). In some configurations, the electronic device 102 may request and/or receive the one or more images (e.g., image frames) via the communication interface 108. For example, the electronic device 102 may or may not include a camera (e.g., an image sensor 104 and/or optical system 106) and may receive images from one or more remote devices.

The memory 126 may store instructions and/or data. The processor 112 may access (e.g., read from and/or write to) the memory 126. Examples of instructions and/or data that may be stored by the memory 126 may include image data 128 (e.g., one or more sets of image frames, video, etc.), features, feature points, feature vectors, feature map 136 data, detection results 138 data, keypoint data, corner data, image obtainer 114 instructions, motion detector 116 instructions, feature manager 118 instructions, feature extractor(s) 120 instructions, ROI determiner 122 instructions, feature aligner(s) 124 instructions, object detector(s) 130 instructions, and/or instructions for other elements, etc.

In some configurations, the electronic device 102 (e.g., the memory 126) may include an image data buffer (not shown). The image data buffer may buffer (e.g., store) image data 128 (e.g., image frame(s)) from the image sensor 104. The buffered image data may be provided to the processor 112.

In some configurations, the electronic device 102 may include a camera software application and/or a display 132. When the camera application is running, images of scenes and/or objects that are located within the field of view of the optical system(s) 106 may be captured by the image sensor(s) 104. The images that are being captured by the image sensor(s) 104 may be presented on the display 132. In some configurations, these images may be displayed in rapid succession at a relatively high frame rate so that, at any given moment in time, the objects that are located within the field of view of the optical system 106 are presented on the display 132. The one or more images obtained by the electronic device 102 may be one or more video frames, one or more still images, and/or one or more burst frames, etc. It should be noted that some configurations of the systems and methods disclosed herein may utilize a series of image frames (e.g., video).

The processor 112 may include and/or implement an image obtainer 114, a motion detector 116, a feature manager 118, one or more feature extractors 120, a ROI determiner 122, one or more feature aligners 124, and/or one or more object detectors 130. It should be noted that one or more of the elements illustrated in the electronic device 102 and/or processor 112 may not be implemented in some configurations.

In some configurations, one or more of the elements illustrated in the processor 112 may be implemented separately from the processor 112 (e.g., in other circuitry, on another processor, on a separate electronic device, etc.). For example, the image obtainer 114, the motion detector 116, the feature manager 118, the feature extractor(s) 120, the ROI determiner 122, the feature aligner(s) 124, and/or the object detector(s) 130 may be implemented on a separate processor, on multiple processors, and/or a combination of processors.

The processor 112 may include and/or implement an image obtainer 114. One or more images (e.g., image frames, video, burst shots, etc.) may be provided to the image obtainer 114. For example, the image obtainer 114 may obtain (e.g., receive) image frames from one or more image sensors 104. For instance, the image obtainer 114 may receive image data from one or more image sensors 104 and/or from one or more external cameras. As described above, the image(s) may be captured from the image sensor(s) 104 included in the electronic device 102 and/or may be captured from one or more remote camera(s). In some configurations, the image obtainer 114 may request and/or receive the set of images. For example, the image obtainer 114 may request and/or receive one or more images from a remote device (e.g., external camera(s), remote server, remote electronic device, etc.) via the communication interface 108.

In some configurations, the image obtainer 114 may obtain a set of image frames at a frame rate (e.g., frame capture rate). For example, the electronic device 102 may capture the set of image frames at a frame rate or the electronic device 102 may receive a set of image frames that has been captured by another device at a frame rate. The set of images (e.g., video) may include (e.g., depict) one or more objects.

The processor 112 may include and/or implement a motion detector 116 in some configurations. The motion detector 116 may detect motion in one or more images (e.g., frames, video, etc.). For example, the motion detector 116 may detect a motion region. The motion region may include one or more areas of an image where motion is detected. In some approaches, the motion detector 116 may compare images (e.g., frames) to determine the motion region in which the image has changed (to a degree, for example) relative to another image (e.g., previous image, subsequent image, etc.). Examples of the motion detector 116 include an object tracker, blob tracker, foreground detector, etc. The motion detector 116 may produce a motion map. The motion map may indicate the motion region and/or the static region of an image. As described above, the motion map may be a binary mask (where locations with one value indicate motion and locations with another value indicate no motion, for example) or a soft-value mask.
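A minimal sketch of one such motion detector, assuming simple frame differencing on grayscale frames (the threshold value is an assumption; a blob tracker or foreground detector could be substituted):

```python
import numpy as np

def binary_motion_mask(prev_frame, curr_frame, threshold=25):
    """Produce a binary motion map by thresholding the absolute difference
    between two grayscale frames: 1 indicates motion, 0 indicates static."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```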

The processor 112 may include and/or implement a feature manager 118. The feature manager 118 may determine and/or maintain a feature map 136. The feature map 136 may indicate one or more features (e.g., one or more feature locations relative to one or more images). For example, the feature map 136 may be stored in a pool of memory 126 that indicates one or more feature locations for one or more images. The feature map 136 may be cached (e.g., stored in memory 126).

The feature manager 118 may manage (e.g., control) feature extraction. For example, the feature manager 118 may control when feature extraction and/or updating are performed for one or more regions (e.g., motion region, static region, and/or one or more ROIs, etc.). In some approaches, the feature manager 118 may control the priority and/or rate of feature extraction for one or more regions. For example, the motion region (e.g., one or more image areas with detected motion) may be given feature extraction and/or update priority over the static region (e.g., one or more ROIs in the static region). For instance, feature extraction may be performed on the motion region (e.g., all motion region areas) for every image (e.g., frame) in a set of images (e.g., frames). In some cases, feature extraction may not be performed for every image in the set of images for the entire static region and/or may not be performed for all ROIs of the static region. For example, the feature manager 118 may amortize feature extraction for the ROIs in the static region (e.g., may spread ROI feature extraction over a number of frames). In some approaches, the feature manager 118 may control feature extraction such that features are extracted from a number of ROIs (e.g., a subset of the ROIs, one ROI, etc.) in the static region for each image (e.g., frame). For example, feature extraction for a set of ROIs in the static region may include extracting features from at most a subset of the set of ROIs for each image. In some cases, feature extraction in the static region (e.g., static region ROIs) may be skipped for one or more images (e.g., frames).
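The following sketch shows one way such amortization might be scheduled, updating at most a fixed number of static-region ROIs per frame in round-robin order; the class name, interface, and per-frame budget are illustrative assumptions, not the disclosed implementation:

```python
from collections import deque

class StaticRoiScheduler:
    """Amortize static-region feature extraction by updating at most
    `rois_per_frame` ROIs per frame in round-robin order."""

    def __init__(self, static_rois, rois_per_frame=2):
        self.queue = deque(static_rois)
        self.rois_per_frame = rois_per_frame

    def next_batch(self):
        """Return the subset of static ROIs to update for the current frame."""
        batch = []
        for _ in range(min(self.rois_per_frame, len(self.queue))):
            roi = self.queue.popleft()
            batch.append(roi)
            self.queue.append(roi)  # re-enqueue for a later frame
        return batch
```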

In some configurations, the feature manager 118 may additionally or alternatively control feature extraction based on processing load (e.g., current processing load). For example, as the processing load for the motion region (e.g., motion region feature extraction, motion region object detection, etc.) increases, the feature manager 118 may reduce feature extraction in the static region. For instance, the feature manager 118 may determine the size of the motion region, may determine a number of areas (e.g., bounding boxes) or objects in the motion region, and/or may determine another processing load measure (e.g., a proportion of occupied processing capacity). In some configurations, as the size of the motion region increases, as the number of areas or objects in the motion region increases, and/or as another processing load measure increases, the feature manager 118 may reduce the number of ROIs (per frame, for example) for which feature extraction is performed in the static region and/or may increase the number of frames for which feature extraction in the static region is skipped. Additionally or alternatively, as the size of the motion region decreases, as the number of areas or objects in the motion region decreases, and/or as another processing load measure decreases, the feature manager 118 may increase the number of ROIs (per frame, for example) for which feature extraction is performed in the static region and/or may decrease the number of frames for which feature extraction in the static region is skipped.

In some approaches, the feature manager 118 may adaptively control the feature extraction using one or more thresholds or functions. For example, when the size of the motion region becomes larger than a threshold, the feature manager 118 may reduce feature extraction for the static region. A series of thresholds may be utilized to progressively reduce and/or increase static region feature extraction based on the size of the motion region, the number of objects or areas in the motion region, and/or another processing load measure. Additionally or alternatively, a function may be utilized that maps an amount of processing load (overall or for the motion region, for example) to an amount of static region feature extraction.
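A minimal sketch of such a threshold-based mapping, assuming the processing load measure is the fraction of the frame covered by the motion region (the threshold values and ROI budgets are assumptions for illustration):

```python
def static_rois_per_frame(motion_fraction, thresholds=(0.1, 0.3, 0.6)):
    """Map the fraction of the frame occupied by the motion region to a
    static-region ROI budget per frame (a series of thresholds that
    progressively reduces static-region feature extraction)."""
    if motion_fraction > thresholds[2]:
        return 0  # heavy motion load: skip static updates this frame
    if motion_fraction > thresholds[1]:
        return 1
    if motion_fraction > thresholds[0]:
        return 2
    return 4      # mostly static scene: refresh more ROIs per frame
```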

In some configurations, the feature manager 118 may control feature extraction and/or updating based on one or more time stamps. For example, each time features are extracted and/or updated for a region (e.g., ROI in the static region, area of the motion region, etc.), the feature manager 118 may record a time stamp associated with the region. In some configurations, the feature manager 118 may prioritize feature extraction and/or updating for one or more regions with older time stamps. For example, when determining which ROI to update in the static region, the feature manager 118 may prioritize one or more ROIs with older time stamps relative to one or more ROIs with more recent time stamps. A sketch combining this rule with the “don't care” rule described next follows the next paragraph.

In some configurations, the feature manager 118 may control feature extraction and/or updating based on one or more specified “don't care” regions. For example, the electronic device 102 may receive an input (e.g., user input) specifying one or more “don't care” regions. In some cases, a “don't care” region may be an area of an image that is unlikely to provide useful information. For example, a very distant part of a scene, a ceiling, or another area of an image may be unlikely to provide useful information. The feature manager 118 may avoid performing feature extraction and/or updating in the one or more “don't care” regions.
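The two update rules above (older time stamps first, and skipping “don't care” regions) might be combined as in the following sketch; the data structures and staleness measure are illustrative assumptions:

```python
def select_roi_to_update(rois, last_updated, dont_care, now):
    """Pick the static-region ROI whose cached features are stalest,
    skipping any ROI inside a user-specified "don't care" region.
    `last_updated` maps each ROI to its last feature-update time stamp."""
    candidates = [r for r in rois if r not in dont_care]
    if not candidates:
        return None
    # Oldest time stamp first (largest staleness).
    return max(candidates, key=lambda r: now - last_updated.get(r, 0.0))
```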

The processor 112 may include and/or implement one or more feature extractors 120. The feature extractor(s) 120 may extract features from one or more images (e.g., frames). For example, the feature extractor(s) 120 may determine one or more features such as feature points, keypoints, feature vectors, corners, lines, etc., in one or more regions (e.g., motion region, static region, ROIs, bounding boxes, etc.) of one or more images. In some approaches, determining the one or more features may include searching a region for a structure or pattern (e.g., corner) to determine the features.

The feature extractor(s) 120 may be controlled by the feature manager 118 as described above. For example, the feature manager 118 may control when (e.g., for which image or frame) and/or where (e.g., in which region(s)) the feature extractor(s) 120 are employed to extract and/or update features.

In some configurations, the feature extractor(s) 120 may include a motion region feature extractor (e.g., a synchronous feature extractor) and a static region feature extractor (e.g., an asynchronous feature extractor). The feature manager 118 may control the motion region feature extractor and the static region feature extractor. The motion region feature extractor and the static region feature extractor may be employed at different rates. For example, the motion region feature extractor may extract, at a first rate, features from the motion region. The static region feature extractor may extract, at a second rate, features from the static region (e.g., one or more ROIs in the static region). In some approaches, the features from the motion region may not include features from the static region.

The processor 112 may include and/or implement a ROI determiner 122. The ROI determiner 122 may determine one or more ROIs in one or more images (e.g., frames). For example, the ROI determiner 122 may determine one or more ROIs in the static region and/or in the motion region. For instance, the ROI determiner 122 may determine ROIs that enclose features of one or more potential objects. In some configurations, the ROI determiner 122 may output one or more ROIs regardless of whether the region is static or moving. For example, the ROI determiner 122 may not rely on motion information to propose ROIs. For instance, the ROI determiner 122 may propose an ROI using one or more region proposal networks. In some implementations, a region proposal network may be a fully convolutional network. In some approaches, the region proposal network may utilize a sliding window (e.g., a set of windows) over the feature map 136. Each window may be assigned to an intermediate feature, which may be provided to a regression layer and a classification layer of the network. The regression layer may produce a set of region proposals and the classification layer may produce a set of values indicating respective object probabilities (e.g., probability that an object is in a region) for each of the region proposals. It should be noted that in some implementations, the region proposal network may share one or more layers with one or more object detection networks (which may be utilized by the object detector(s) 130).
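A heavily simplified sketch of the sliding-window idea, assuming pretrained head weights and omitting anchors and non-maximum suppression; the function and parameter names are hypothetical:

```python
import numpy as np

def rpn_heads(feature_map, w_reg, w_cls):
    """Minimal sliding-window region proposal sketch: each spatial position of
    the feature map serves as an intermediate feature fed to a regression head
    (box deltas) and a classification head (objectness probability)."""
    c, h, w = feature_map.shape
    feats = feature_map.reshape(c, h * w).T               # one feature per window
    box_deltas = feats @ w_reg                            # (h*w, 4) region proposals
    objectness = 1.0 / (1.0 + np.exp(-(feats @ w_cls)))   # (h*w,) probabilities
    return box_deltas, objectness

# Example with random weights, illustrative only
fm = np.random.randn(256, 32, 32).astype(np.float32)
deltas, scores = rpn_heads(fm, np.random.randn(256, 4), np.random.randn(256))
```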

In some configurations, the processor 112 may include and/or implement one or more feature aligners 124. The feature aligner(s) 124 may align features in one or more regions (e.g., motion region, static region, ROIs, bounding box, etc.). In some configurations, the feature alignment may be implemented by an ROI pooling layer or an ROI align layer. For example, feature alignment may be a conjunction layer between a feature map and a detector that aligns the feature(s) within an ROI. The feature aligner(s) 124 may take features enclosed by ROIs of different sizes or aspect ratios and produce feature outputs of the same dimension. Accordingly, the feature aligner(s) 124 may condense and/or align the feature enclosed by an ROI into the same output dimension. The aligned feature may then be used for object detection (e.g., classification and/or localization).
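A simplified sketch of ROI pooling under these assumptions (a fixed grid of max-pooled cells; real implementations may instead use ROI align with bilinear sampling):

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=7):
    """Simplified ROI pooling: divide the ROI into an out_size x out_size grid
    and max-pool each cell, so ROIs of any size or aspect ratio produce a
    fixed-dimension output."""
    c = feature_map.shape[0]
    y0, x0, y1, x1 = roi
    ys = np.linspace(y0, y1, out_size + 1).astype(int)
    xs = np.linspace(x0, x1, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Ensure each cell covers at least one feature location.
            cell = feature_map[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                                  xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.reshape(c, -1).max(axis=1)
    return out
```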

In some configurations, the feature aligner(s) 124 may include a motion region feature aligner (e.g., a synchronous feature aligner) and a static region feature aligner (e.g., an asynchronous feature aligner). The motion region feature aligner and the static region feature aligner may be employed at different rates. For example, the motion region feature aligner may align, at a first rate, features from the motion region (e.g., from one or more areas or bounding boxes of the motion region). The static region feature aligner may align, at a second rate, features from the static region (e.g., from one or more ROIs in the static region).

The processor 112 may include and/or implement one or more object detectors 130. The object detector(s) 130 may detect one or more objects in one or more images. For example, the object detector(s) 130 may search one or more regions (e.g., motion region, static region, ROIs, bounding boxes, etc.) for one or more objects. In some approaches, object detection may be performed based on features. For example, the object detector(s) 130 may utilize a neural network structure to provide the prediction of the object class and location. For instance, a neural network may be trained to classify proposed regions as object types. As described above, the neural network (e.g., convolutional neural network) for object detection (e.g., classification) may share one or more layers with a network for region proposal.

The object detector(s) 130 may produce detection (e.g., identification, classification, etc.) results. In some approaches, the detection results may include a location (e.g., bounding box, ROI, etc.) of a detected object. Additionally or alternatively, the detection results may include a classification or type (e.g., identification) of a detected object. For example, performing object detection may include classifying one or more objects. For instance, each object template may have an associated classification or type. Examples of object classifications include vehicles (e.g., cars, trucks, buses, motorcycles, aircraft, bicycles, scooters, etc.), people, plants, trees, buildings, signs, roads, rocks, clothing, weapons, and other objects. In some approaches, the classification or type of a detected object may be indicated with a label (e.g., a word, character(s), symbol(s), etc.). The detection results 138 may be cached in a cache (in memory 126, for instance). For example, the detection results 138 may include a location (e.g., bounding box, ROI, etc.) for one or more detected objects and/or may include a classification or type (e.g., label) for one or more detected objects. Caching the detection results and/or feature map may be beneficial. For example, having the detection results and/or feature map cached may enable avoiding repeated computation.

In some configurations, detection results may be presented on the display 132. For example, the processor 112 may present a location (e.g., bounding box, ROI, etc.) of the detected object and/or a label (e.g., word(s), character(s), symbol(s), etc.) on the display 132. For example, an image may be presented, where a bounding box is presented around the detected object and a label (e.g., “car,” “person,” etc.) is presented on the display 132 (e.g., near the corresponding bounding box).

In some configurations, the object detector(s) 130 may include a motion region object detector (e.g., a synchronous object detector) and a static region object detector (e.g., an asynchronous object detector). The motion region object detector and the static region object detector may be employed at different rates. For example, the motion region object detector may detect, at a first rate, one or more objects from the motion region (e.g., from one or more areas or bounding boxes of the motion region). The static region object detector may detect, at a second rate, one or more objects from the static region (e.g., from one or more ROIs in the static region).

In some configurations, the processor 112 may perform one or more operations of a first thread at a first rate and may perform one or more operations of a second thread at a second rate. In some approaches, the first thread may include extracting features from the motion region and detecting one or more objects based on the features. In some approaches, the second thread may include extracting features from the static region (e.g., from one or more ROIs in the static region), caching the features, determining one or more ROIs in the static region, and/or detecting one or more objects in the static region (e.g., in one or more ROIs) based on the features.
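One way the two rates might be realized is a per-frame loop in which the second thread's work runs only every few frames, as in this sketch; the rate ratio and the callable names are illustrative assumptions:

```python
def process_stream(frames, sync_step, async_step, async_period=4):
    """Run the synchronous (motion-region) operations every frame at the first
    rate and the asynchronous (static-region) operations once every
    `async_period` frames at the second, lower rate."""
    for index, frame in enumerate(frames):
        sync_step(frame)                 # motion-region extraction + detection
        if index % async_period == 0:
            async_step(frame)            # static-region extraction + caching
```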

In some cases, an object in the static region may begin to move. The motion detector 116 may detect the movement in the static region. In some configurations, the processor 112 (e.g., ROI determiner 122, feature aligner(s) 124, and/or object detector(s) 130) may retrieve information (e.g., feature map 136 data and/or detection results 138 data) from a cache (e.g., memory 126) corresponding to the movement. In some approaches, the information may include cached features corresponding to an area where the movement is detected. The ROI determiner 122 may determine an ROI based on the cached features and/or the object detector(s) 130 may detect an object based on the ROI and/or the cached features. In some approaches, the processor 112 may present the detection results (e.g., bounding box and/or label) corresponding to the object that has begun moving. By retrieving the cached features, the processor 112 may avoid having to extract features again for the object that has begun moving. In some approaches, the retrieved information may include a label and/or an ROI. The processor 112 may directly present the ROI and/or the label.

In some configurations, the processor 112 may determine whether cached detection results are reliable before using the cached detection results (e.g., cached features, cached ROI, and/or cached label). For example, the processor 112 may determine whether motion (e.g., a degree of motion) was detected for a number of frames before the current frame. If little (e.g., less than a threshold) or no motion was detected for the number of frames (e.g., one or more), the cached detection results (e.g., cached features, cached ROI, cached localization bounding box, and/or cached label) may be considered reliable. In some configurations, the cached features may be updated (e.g., consistently updated, regularly updated, etc.) by the feature manager 118.

In some approaches, the reliability determination may be utilized to determine whether a cached ROI and/or label may be utilized, or whether only the cached features may be used (with ROI determination and/or object detection performed again). For example, if a reliability criterion is met (e.g., little or no movement before the current frame), then the ROI and/or label may be presented directly. If the reliability criterion is not met, the cached features may be utilized to re-compute an ROI and/or to perform object detection before presenting the updated detection results. In some configurations, the cached features may be updated (e.g., consistently updated, regularly updated, etc.) by the feature manager 118.
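A sketch of one possible reliability criterion, assuming the degree of motion in the ROI is tracked per frame; the threshold and window length are assumptions:

```python
def use_cached_results(motion_history, motion_threshold=0.02, window=5):
    """Reliability criterion sketch: if the mean motion intensity in the ROI
    over the last `window` frames is below `motion_threshold`, the cached ROI
    and label may be presented directly; otherwise only the cached features
    are reused and ROI determination/detection is re-run."""
    recent = motion_history[-window:]
    return len(recent) == window and sum(recent) / window < motion_threshold
```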

It should be noted that one or more of the elements or components of the electronic device 102 may be combined and/or divided. For example, one or more of the image obtainer 114, the motion detector 116, the feature manager 118, the feature extractor(s) 120, the ROI determiner 122, the feature aligner(s) 124, and/or the object detector(s) 130 may be combined. Additionally or alternatively, one or more of the image obtainer 114, the motion detector 116, the feature manager 118, the feature extractor(s) 120, the ROI determiner 122, the feature aligner(s) 124, and/or the object detector(s) 130 may be divided into elements or components that perform a subset of the operations thereof.

FIG. 2 is a flow diagram illustrating one configuration of a method 200 for object detection. The method 200 may be performed by the electronic device 102, for example. The electronic device 102 may receive 202 a set of images. This may be accomplished as described in relation to FIG. 1. For example, receiving 202 may include receiving the set of images from an image sensor included in the electronic device 102 or from a remote device (e.g., camera).

The electronic device 102 may determine 204 a motion region and a static region based on the set of images. This may be accomplished as described in relation to FIG. 1. For example, the electronic device 102 may compare images in the set of images (e.g., compare a previous image to a current image, etc.) to determine whether and/or what area(s) of the image indicate motion. One or more areas that indicate a difference (e.g., a threshold difference) may be the motion region, whereas one or more areas that do not indicate a difference (e.g., a threshold difference) may be the static region.

The electronic device 102 may extract 206, at a first rate, first features from the motion region. This may be accomplished as described in relation to FIG. 1. For example, the electronic device 102 may extract features from the motion region (e.g., one or more areas of the motion region). The first rate may be expressed as a frame rate, a frequency, etc. In some approaches, the first rate may correspond to an input frame rate (e.g., 60 frames per second (fps), 30 fps, etc.). For example, features may be extracted 206 from the motion region for every received image or frame. In another example, features may be extracted 206 at a sub-rate of the input frame rate (e.g., once every two frames, once every four frames, etc.).

The electronic device 102 may extract 208, at a second rate that is different from the first rate, second features from the static region. This may be accomplished as described in relation to FIG. 1. For example, the electronic device 102 may extract features from the static region (e.g., one or more areas of the static region, ROIs, etc.). The second rate may be expressed as a frame rate, a frequency, etc. In some approaches, the second rate may be lower than the first rate. For example, features may be extracted 208 from a subset of the static region for every received image or frame. Accordingly, several images or frames may be processed to extract 208 features from all of the ROIs in the static region. In another example, features may be extracted 208 at a sub-rate of the input frame rate (e.g., once every two frames, once every four frames, etc.). It should be noted that the second rate may be higher than the first rate in some cases. For example, in a case that little or no motion is detected, feature extraction 206 from the motion region may occur at a lower rate than feature extraction 208 in the static region.

The electronic device 102 may cache 210 the second features. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may store and/or update the second features from the static region in a feature map in memory.

The electronic device 102 may detect 212 at least one object based on at least a portion of the first features. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may perform object detection in the motion region based on the extracted first features. In some approaches, different portions of the first features may be utilized to detect objects in different areas of the motion region.
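Putting the steps of the method 200 together, the following sketch (reusing the hypothetical binary_motion_mask helper from the earlier sketch) shows one possible per-frame flow; the helper callables and the static-region update period are illustrative assumptions, not the disclosed implementation:

```python
def method_200(images, extract_features, detect_objects, feature_cache,
               static_period=4):
    """End-to-end sketch of the flow of FIG. 2 under stated assumptions:
    frame differencing splits each image into motion/static regions (204),
    motion features are extracted every frame at the first rate (206),
    static features are extracted every `static_period` frames at the second
    rate (208) and cached (210), and detection runs on the motion features
    (212). `extract_features`, `detect_objects`, and `feature_cache` are
    hypothetical placeholders."""
    prev = None
    detections = []
    for index, image in enumerate(images):                      # step 202
        if prev is not None:
            motion_mask = binary_motion_mask(prev, image)       # step 204
            first = extract_features(image, motion_mask == 1)   # step 206
            if index % static_period == 0:
                second = extract_features(image, motion_mask == 0)  # step 208
                feature_cache.update(second)                    # step 210
            detections.append(detect_objects(first))            # step 212
        prev = image
    return detections
```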

FIG. 3 is a block diagram illustrating an example of object detection in accordance with some configurations of the systems and methods disclosed herein. In particular, FIG. 3 illustrates examples of a motion detector 316, a feature manager 318, feature extractors 320 a-b, a feature map 336, a ROI determiner 322, feature aligners 324 a-b, object detectors 330 a-b, and a detection manager 342. One or more of the elements described in connection with FIG. 3 may be examples of corresponding elements described in connection with FIG. 1. Additionally or alternatively, one or more of the elements described in connection with FIG. 3 may be implemented in the electronic device 102 described in connection with FIG. 1.

In the example illustrated in FIG. 3, one or more video streams 344 are provided to the motion detector 316 and to the feature extractors 320 a-b. The motion detector 316 may determine a motion map 346 or mask (e.g., a motion region and a static region). The motion map 346 may be provided to the feature manager 318 and to the detection manager 342.

The feature manager 318 maintains a cached and shared feature map 336. For example, the cached feature map 336 may be utilized and/or updated based on motion and/or a time stamp. In some configurations, the feature manager 318 may control each of the feature extractors 320 a-b to extract features and/or manage updating features. For example, a first feature extractor 320 a may extract features from the motion region(s) of the motion mask 346 and/or a second feature extractor 320 b may extract features from a static region of the motion mask 346. The extracted features may be stored in a feature map 336. Operation thread A (e.g., a synchronous thread) and operation thread B (e.g., an asynchronous thread) may both access the shared feature map 336. For example, operation thread B may asynchronously access cached features in the feature map 336.

The ROI determiner 322 may determine ROIs 340 based on the feature map 336. For example, the ROI determiner 322 may propose one or more ROIs 340 that may enclose objects using region proposal networks. The ROIs 340 may be provided to a second feature aligner 324 b (of operation thread B, for example). In some configurations, the ROIs 340 (e.g., ROIs corresponding to motion regions) may be provided to a first feature aligner 324 a (of operation thread A, for example).

The feature aligners 324 a-b may align features. In some configurations, feature alignment may be performed as described in connection with FIG. 1. For example, the first feature aligner 324 a may perform feature alignment corresponding to the motion region(s) (e.g., ROIs in the motion region(s)). The second feature aligner 324 b may perform feature alignment corresponding to the static region(s) (e.g., ROIs in the static region(s)).

The detectors 330 a-b may detect one or more objects. In some configurations, object detection may be performed as described in connection with FIG. 1. For example, the first detector 330 a may perform object detection corresponding to the motion region(s) (e.g., ROIs in the motion region(s)). The second detector 330 b may perform object detection corresponding to the static region(s) (e.g., ROIs in the static region(s)). Detection results from the second object detector 330 b may be cached. The object detection results from the detectors 330 a-b may be provided to the detection manager 342.

In some approaches, the feature manager 318 may update the features in accordance with one or more rules. One rule may prioritize feature updates for the motion region over feature updates for the static region. Another rule may prioritize one or more regions with older time stamps over one or more regions with newer time stamps. Yet another rule may avoid feature updates for one or more user-specified “don't care” regions.

As illustrated in FIG. 3, operations may be organized into operation thread A and operation thread B. In some configurations, operation thread A may be considered a synchronous operation thread and operation thread B may be considered an asynchronous operation thread. The functions included in operation thread A may be performed at a different rate than the functions included in operation thread B.

In some configurations, a spatially and/or temporally asynchronous updating scheme may be utilized for the feature update in operation thread B. The feature extractor 320 b in operation thread B may extract features from the static region of the motion map 346 to update the feature map 336. Instead of updating the feature map 336 of an entire frame at once, several disjoint regions (e.g., ROIs 340) may be updated in a round-robin fashion across several frames. The computation load may be amortized assuming the background remains static for a number of (e.g., two or more) frames.

In some configurations, both operation threads may operate on a shared feature map 336 in memory. One advantage to a shared feature map may be avoiding duplicated computation of features when ROIs 340 overlap. This property may be inherited from the RCNN approach. For instance, if a first ROI is a sub-region of a second ROI at the current time stamp, only the feature located at the second ROI may need to be computed, as the feature of the first ROI may be computed by (e.g., included in) the second ROI.

Some configurations of the systems and methods disclosed herein may significantly reduce the computation complexity of an RCNN object detector by utilizing different operation threads (e.g., synchronous and asynchronous threads). For example, the computation complexity of an RCNN without different operation threads may be expressed as O(HW)+O(1)+O(N), where H denotes image height, W denotes image width, and N denotes the number of ROIs.

When the different operation threads are implemented, the computation complexity (average case) may be expressed as O(1)+O(1)+O(1). For example, operation thread A (e.g., a synchronous thread) complexity may be expressed as O(N′H′W′)+O(1)+O(N′), where N′ is the number of ROIs in the motion region, H′ is the height of the ROIs in the motion region, and W′ is the width of the ROIs in the motion region. Operation thread B (e.g., an asynchronous thread) complexity may be expressed as O(HW)+O(1)+O(N). The RCNN operations may be amortized into constant time O(1) across several frames. The overall (average case) complexity may accordingly be expressed as O(N′H′W′)+O(1)+O(N′)≈O(1)+O(1)+O(1). The assumptions of the average case are as follows. First, the number of motion ROIs (N′) is much less than the number of ROIs (N) generated by a region proposal generator (e.g., ROI determiner), i.e., N′<<N. Second, each motion ROI is relatively small with respect to the entire frame, i.e., H′<<H and W′<<W. Third, most of the regions in the input frame remain static. The motion region only occupies a relatively small portion of the input frame. Hence, it can be expected that the overall complexity may achieve constant time as compared to an RCNN implementation without different threads.

Utilizing the cached feature map and detection results may be beneficial by increasing efficiency (e.g., reducing computation complexity and time delay). In some configurations, the cached feature map 336 may be manipulated, interpolated, extrapolated, and/or filtered. Additionally or alternatively, the context of the cached feature map and detection results may be utilized. For example, some approaches may utilize the temporally and spatially correlated nature of the video stream, which may not be well-exploited in a single-frame-based RCNN.

The detection manager 342 may control the display of object detection results (to end users, for example) by fusing the object detection results from the operation threads (e.g., synchronous and asynchronous threads). In some configurations, the detection manager 342 reduces the latency of reporting a static-to-motion object (e.g., the bounding box and class label of a static-to-moving object), since the cached detection results may be directly transferred to a display stage (e.g., the display 332) when the object starts moving. Some configurations of the systems and methods disclosed herein are capable of detecting both static and moving objects. In some approaches, moving objects may be served with a higher priority to satisfy a time-sensitive quality of service. FIGS. 4-5 illustrate an example of reporting the detection (e.g., recognition) and localization of a static (non-moving) bus that begins to move based on the cached results from an earlier frame.

FIG. 4 is a diagram illustrating an example of object detection in accordance with some configurations of the systems and methods disclosed herein. For instance, the electronic device 102 may operate in accordance with the example of FIG. 4 in some configurations. In this example, an image 448 is received, where the image 448 includes some trees, a car, a bus, and a person. The car and person are in motion. The bus and the trees are static (e.g., not moving initially).

A motion mask 450 may be determined. The motion mask 450 may be an example of the motion map described herein. The motion mask 450 may be a binary motion mask that indicates a motion region 454 and a static region 452. As can be observed, the motion region corresponds to the car and the person of the image 448.

Features may be extracted to produce a feature map 456 a. The feature map 456 a includes features A 458 a corresponding to the motion region 454 and features B 458 b corresponding to the static region 452. A number of ROIs 460 may be determined. In some configurations, areas corresponding to the motion region 454 may also be ROIs. The ROIs 460 are illustrated relative to the feature map 456 b.

In the example of FIG. 4, a labeled image 462 may be produced. In some approaches, only moving objects may be indicated to a user (e.g., annotated). For example, ROIs or bounding boxes corresponding to the motion region may be presented. Additionally or alternatively, labels 464 corresponding to detected moving objects may also be presented. In the example of FIG. 4, the car and the person are indicated. Detection results for the trees and bus may be cached.

FIG. 5 is a diagram illustrating another example of object detection in accordance with some configurations of the systems and methods disclosed herein. For instance, the electronic device 102 may operate in accordance with the example of FIG. 5 in some configurations. In this example, an image 548 is received, where the image 548 includes some trees, a car, a bus, and a person. The car, the person, and the bus are in motion. The image 548 may be received after the image 448 described in connection with FIG. 4.

A motion mask 550 may be determined. The motion mask 550 may be a binary motion mask that indicates a motion region 554 and a static region 552. As can be observed, the motion region corresponds to the car, the person, and the bus of the image 548.

In this example, the feature map 556 may include the cached features A 558 a and cached features B 558 b. ROIs 560 may also be cached in relation to the feature map 556.

In the example of FIG. 5, a labeled image 562 may be produced. In some approaches, only moving objects may be indicated to a user (e.g., annotated). For example, ROIs or bounding boxes corresponding to the moving objects may be presented. Additionally or alternatively, labels 564 corresponding to moving objects may also be presented. In the example of FIG. 5, the car, the person, and the bus are indicated. For example, when the bus starts moving, the detection manager may transfer the bus label from cached results to a display for presentation (e.g., reporting). This approach avoids the latency of updating the feature. Results for the trees may remain in the cache. In some approaches, when the bus starts moving, the cached recognition and localization results for the bus may be directly presented to avoid the latency of performing RCNN detection on the bus's ROI.

FIG. 6 is a block diagram illustrating an example of elements that may be implemented in accordance with some configurations of the systems and methods disclosed herein. For example, a processor (e.g., processor 112 described in connection with FIG. 1) may include and/or implement one or more of the elements described in connection with FIG. 6. One or more of the elements described in connection with FIG. 6 may be examples of corresponding elements described in connection with one or more of FIGS. 1 and 3.

In particular, FIG. 6 illustrates an image obtainer 614, a motion detector 616, a feature manager 618, an ROI determiner 622, a detection manager 642, and a display interface 668. In this example, the image obtainer 614 provides a low-resolution stream to the motion detector 616 and a high-resolution stream to the feature manager 618. For instance, a video stream at a first resolution may be provided to the motion detector 616 and the video stream at a second resolution may be provided to the feature manager 618, where the first resolution is lower than the second resolution.
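For illustration, the following is a minimal sketch of this two-stream split, assuming the same captured frame is downscaled for motion detection and passed through at full resolution for feature extraction; the 1/4 scale factor is an assumption.

```python
import cv2

def split_streams(frame, scale=0.25):
    """Produce a low-resolution stream for the motion detector and a
    high-resolution stream for the feature manager."""
    low_res = cv2.resize(frame, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)  # cheap motion analysis
    return low_res, frame                               # full detail for features
```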

The motion detector 616 may detect motion in the low-resolution stream as described in connection with one or more of FIGS. 1-3. The motion detector 616 may produce a motion map or mask, which may be provided to the feature manager 618. In this example, feature extractor A 620a (e.g., a synchronous feature extractor) and feature extractor B 620b (e.g., an asynchronous feature extractor) may be included in the feature manager 618. Feature extractor A 620a may perform feature extraction as described in connection with feature extractor 320a of operation thread A of FIG. 3. Feature extractor B 620b may perform feature extraction as described in connection with feature extractor 320b of operation thread B of FIG. 3. The extracted features may be stored in a feature map (e.g., a shared feature map), which may be provided to the ROI determiner 622 and to the detection manager 642. The ROI determiner 622 may determine one or more ROIs based on the feature map. The ROI(s) may be provided to the detection manager 642 (e.g., object detector B 630b).
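The following sketch illustrates one possible scheduling of the synchronous and asynchronous extractors, assuming the second rate is a fixed fraction of the first (e.g., every eighth frame); the disclosure leaves the exact rates open, and `extract` is a hypothetical backbone callable.

```python
class FeatureManager:
    """Runs extractor A every frame and extractor B at a reduced rate."""

    def __init__(self, extract, static_period=8):
        self.extract = extract              # hypothetical feature backbone
        self.static_period = static_period  # run extractor B every Nth frame
        self.cached_static = None           # cached static-region features

    def step(self, frame_idx, image, motion_mask):
        feats_a = self.extract(image, motion_mask == 1)  # extractor A: first rate
        if self.cached_static is None or frame_idx % self.static_period == 0:
            # Extractor B: second (lower) rate; result is cached between runs.
            self.cached_static = self.extract(image, motion_mask == 0)
        return feats_a, self.cached_static  # together they form the shared feature map
```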

In the example shown in FIG. 6, object detector A 630a (e.g., a synchronous object detector) and object detector B 630b (e.g., an asynchronous object detector) may be included in the detection manager 642. Object detector B 630b may detect one or more objects based on the ROI(s) and the feature map. Object detector A 630a may detect one or more objects based on the features from feature extractor A 620a (and/or the motion region(s) provided by the motion detector 616). Detected object results may be provided to the display interface 668.

The display interface 668 may manage the presentation of video analysis results, such as object detection and/or tracking (e.g., human tracking) results. For example, the display interface 668 may be an interface for displaying event information and/or metadata to users. The display interface 668 may include a metadata/event generator 666. The metadata/event generator 666 may log detection and/or tracking results (e.g., label(s), bounding box location(s), width(s), height(s), class prediction probability(ies), and/or decision class(es)) in a database. In some configurations, the database may be stored on an edge device (e.g., a wireless communication device) and/or may be sent back to a network (e.g., cloud) server. The display interface 668 may provide metadata and/or events (e.g., label(s), bounding box location(s), width(s), height(s), class prediction probability(ies), and/or decision class(es)) to a display for presentation.
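As a rough sketch only, logging such records might look like the following; the sqlite3 schema and field names are assumptions chosen to mirror the metadata listed above, not a format required by the disclosure.

```python
import sqlite3

def log_detection(db: sqlite3.Connection, frame_idx, label, box, prob):
    """Log one detection result (label, bounding box, class probability)."""
    x, y, w, h = box
    db.execute("CREATE TABLE IF NOT EXISTS detections "
               "(frame INTEGER, label TEXT, x REAL, y REAL, w REAL, h REAL, prob REAL)")
    db.execute("INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?, ?)",
               (frame_idx, label, x, y, w, h, prob))
    db.commit()
```

For example, `log_detection(sqlite3.connect("events.db"), 42, "car", (120, 80, 64, 48), 0.91)` would record one event; the database could remain on the edge device or be synchronized to a network server, as described above.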

In some configurations, one or more functions (e.g., feature extraction, ROI determination, and/or object detection or classification) may be implemented in one or more neural networks (e.g., a region-based convolutional neural network (RCNN)). For example, one or more layers may be shared for ROI determination and classification. In the context of FIG. 6, the RCNN may function in accordance with the motion region(s) and/or static region(s) as described herein. This may avoid repeatedly computing the entire feature map for each incoming video frame of an object detection task, which may not be feasible for real-time processing in a 4K video use case.
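A minimal sketch of such layer sharing follows, with `backbone`, `propose_rois`, and `classify_roi` as hypothetical stand-ins: the backbone features are computed once and reused by both the ROI-determination and classification stages.

```python
def detect_with_shared_layers(image, backbone, propose_rois, classify_roi):
    """RCNN-style detection in which shared features feed both heads."""
    features = backbone(image)      # computed once per update, not per head
    rois = propose_rois(features)   # ROI determination reuses the shared features
    return [(roi, classify_roi(features, roi)) for roi in rois]
```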

FIG. 7 is a flow diagram illustrating a more specific configuration of a method 700 for object detection. The method 700 may be performed by the electronic device 102, for example. The electronic device 102 may receive 702 a set of images. This may be accomplished as described in relation to one or more of FIGS. 1-2.

The electronic device 102 may determine 704 a motion region and a static region based on the set of images. This may be accomplished as described in relation to one or more of FIGS. 1-2.

The electronic device 102 may extract 706, at a first rate, first features from the motion region. This may be accomplished as described in relation to one or more of FIGS. 1-2.

The electronic device 102 may extract 708, at a second rate that is different from the first rate, second features from the static region. This may be accomplished as described in relation to one or more of FIGS. 1-2.

The electronic device 102 may cache 710 the second features. This may be accomplished as described in connection with FIG. 1.

The electronic device 102 may detect 712, at the first rate, at least one object based on at least a portion of the first features. This may be accomplished as described in connection with one or more of FIGS. 1-2.

The electronic device 102 may determine 714, at the second rate, at least one ROI in the static region. This may be accomplished as described in connection with FIG. 1.

The electronic device 102 may detect 716, at the second rate, at least one object based on at least a portion of the second features in the at least one ROI. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may perform detection on a subset of ROIs in the static region over the set of images.
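One possible policy for processing at most a subset of the static-region ROIs per image is a round-robin rotation over frames, sketched below; the disclosure does not prescribe how the subset is chosen, so the policy and the subset size are assumptions.

```python
def rois_for_frame(static_rois, frame_idx, per_frame=2):
    """Select at most per_frame static ROIs, cycling through the full set."""
    if not static_rois:
        return []
    per_frame = min(per_frame, len(static_rois))
    start = (frame_idx * per_frame) % len(static_rois)
    return [static_rois[(start + i) % len(static_rois)] for i in range(per_frame)]
```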

The electronic device 102 may cache 718 detection results. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may store ROIs, bounding boxes, and/or labels corresponding to detected objects in memory.

The electronic device 102 may determine 720 whether movement is detected in the static region. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may determine whether movement is occurring in all or part of the (previous) static region. For instance, the electronic device 102 may compare a previous motion map to a current motion map and determine whether motion is detected in the static region of the previous motion map. In a case that movement is not detected in the static region, operation may continue to update feature extraction in the motion region and/or static region, and/or to perform object detection in the motion region and/or static region.
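For illustration, a minimal sketch of this check, assuming binary motion maps in which 1 marks motion: movement is flagged wherever the current map shows motion inside the region that the previous map marked static.

```python
import numpy as np

def movement_in_static_region(prev_mask: np.ndarray, curr_mask: np.ndarray) -> bool:
    """True if motion now appears in the previously static region."""
    previously_static = (prev_mask == 0)
    return bool(np.any(curr_mask[previously_static] == 1))
```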

In a case that movement is detected in the static region, the electronic device 102 may retrieve 722 information from the cache corresponding to the movement. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may retrieve cached features, ROIs, bounding boxes, and/or labels from memory. Retrieving 722 the information from the cache may be performed in response to detecting the movement in the static region.

FIG. 8 is a flow diagram illustrating a configuration of a method 800 for presenting detection results. The method 800 may be performed by an electronic device (e.g., the electronic device 102 described in connection with FIG. 1). In particular, the method 800 illustrates one example of a case in which movement is detected in a static region.

The electronic device 102 may detect 802 movement in the static region. This may be accomplished as described in connection with FIG. 1 or FIG. 7.

The electronic device 102 may determine 804 whether cached detection results are reliable. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may determine whether more than a threshold amount of movement occurred in one or more previous frames. If more than the threshold amount of movement has occurred, the cached detection results may be considered unreliable. Otherwise, the cached detection results may be considered reliable.
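A hedged sketch of one such reliability test follows: the cached result for an ROI is treated as unreliable if more than a threshold fraction of that ROI moved in any recent frame. The measure and the 0.2 threshold are assumptions; the disclosure refers only to "a threshold amount of movement."

```python
def cached_results_reliable(recent_masks, roi, max_moved_fraction=0.2):
    """recent_masks: binary motion masks from previous frames; roi: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    moved = max((float(m[y0:y1, x0:x1].mean()) for m in recent_masks), default=0.0)
    return moved <= max_moved_fraction
```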

In a case that the cached detection results are determined to be reliable, the electronic device 102 may retrieve 806 cached detection results corresponding to the movement. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may retrieve an ROI and/or label corresponding to the area of movement.

The electronic device 102 may present 808 the cached detection results. For example, the electronic device 102 may present the ROI and/or label on a display in association with the detected object in an image.

In a case that the cached detection results are determined to be unreliable, the electronic device 102 may retrieve 810 cached features. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may retrieve features in the cached feature map corresponding to an area of the movement.

The electronic device 102 may determine 812 an ROI based on the cached features. This may be accomplished as described in connection with FIG. 1.

The electronic device 102 may detect 814 an object based on the cached features and the ROI. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may perform object detection (e.g., identification) within the ROI based on the cached features. Detecting 814 an object may produce detection results (e.g., ROI and/or label).

The electronic device 102 may present 816 the detection results. This may be accomplished as described in connection with FIG. 1. For example, the electronic device 102 may present the ROI and/or label on a display in association with the detected object in an image.

FIG. 9 illustrates certain components that may be included within an electronic device 902. The electronic device 902 may be an example of and/or may be implemented in accordance with the electronic device 102 described in connection with FIG. 1. The electronic device 902 may be (or may be included within) a camera, video camcorder, digital camera, cellular phone, smart phone, computer (e.g., desktop computer, laptop computer, etc.), tablet device, media player, television, vehicle, automobile, personal camera, action camera, surveillance camera, mounted camera, connected camera, robot, aircraft, drone, unmanned aerial vehicle (UAV), healthcare equipment, gaming console, personal digital assistant (PDA), set-top box, etc. The electronic device 902 includes a processor 982. The processor 982 may be a general-purpose single- or multi-chip microprocessor (e.g., an advanced RISC machine (ARM)), a special-purpose microprocessor (e.g., a digital signal processor (DSP)), a microcontroller, a programmable gate array, etc. The processor 982 may be referred to as a central processing unit (CPU). Although just a single processor 982 is shown in the electronic device 902, in an alternative configuration, a combination of processors (e.g., an ARM and a DSP) could be used.

The electronic device 902 also includes memory 984. The memory 984 may be any electronic component capable of storing electronic information. The memory 984 may be embodied as random access memory (RAM), read-only memory (ROM), magnetic disk storage media, optical storage media, flash memory devices in RAM, on-board memory included with the processor, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, and so forth, including combinations thereof.

Data 988a and instructions 986a may be stored in the memory 984. The instructions 986a may be executable by the processor 982 to implement one or more of the methods 200, 700, 800 described herein. Executing the instructions 986a may involve the use of the data 988a that is stored in the memory 984. When the processor 982 executes the instructions 986a, various portions of the instructions 986b may be loaded onto the processor 982, and various pieces of data 988b may be loaded onto the processor 982.

The electronic device 902 may also include a transmitter 970 and a receiver 972 to allow transmission and reception of signals to and from the electronic device 902. The transmitter 970 and the receiver 972 may be collectively referred to as a transceiver 976. One or multiple antennas 974a-b may be electrically coupled to the transceiver 976. The electronic device 902 may also include (not shown) multiple transmitters, multiple receivers, multiple transceivers, and/or additional antennas.

The electronic device 902 may include a digital signal processor (DSP) 978. The electronic device 902 may also include a communication interface 980. The communication interface 980 may enable one or more kinds of input and/or output. For example, the communication interface 980 may include one or more ports and/or communication devices for linking other devices to the electronic device 902. Additionally or alternatively, the communication interface 980 may include one or more other interfaces (e.g., touchscreen, keypad, keyboard, microphone, camera, etc.). For example, the communication interface 980 may enable a user to interact with the electronic device 902.

The various components of the electronic device 902 may be coupled together by one or more buses, which may include a power bus, a control signal bus, a status signal bus, a data bus, etc. For the sake of clarity, the various buses are illustrated in FIG. 9 as a bus system 990.

The term “determining” encompasses a wide variety of actions and, therefore, “determining” can include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” can include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” can include resolving, selecting, choosing, establishing, and the like.

The phrase “based on” does not mean “based only on,” unless expressly specified otherwise. In other words, the phrase “based on” describes both “based only on” and “based at least on.”

The term “processor” should be interpreted broadly to encompass a general purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and so forth. Under some circumstances, a “processor” may refer to an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), etc. The term “processor” may refer to a combination of processing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The term “memory” should be interpreted broadly to encompass any electronic component capable of storing electronic information. The term memory may refer to various types of processor-readable media such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, etc. Memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. Memory that is integral to a processor is in electronic communication with the processor.

The terms “instructions” and “code” should be interpreted broadly to include any type of computer-readable statement(s). For example, the terms “instructions” and “code” may refer to one or more programs, routines, sub-routines, functions, procedures, etc. “Instructions” and “code” may comprise a single computer-readable statement or many computer-readable statements.

The functions described herein may be implemented in software or firmware being executed by hardware. The functions may be stored as one or more instructions on a computer-readable medium. The terms “computer-readable medium” and “computer-program product” refer to any tangible storage medium that can be accessed by a computer or a processor. By way of example, and not limitation, a computer-readable medium may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray® disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. It should be noted that a computer-readable medium may be tangible and non-transitory. The term “computer-program product” refers to a computing device or processor in combination with code or instructions (e.g., a “program”) that may be executed, processed, or computed by the computing device or processor. As used herein, the term “code” may refer to software, instructions, code, or data that is/are executable by a computing device or processor.

Software or instructions may also be transmitted over a transmission medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of transmission medium.

The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the method that is being described, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.

Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described herein can be downloaded and/or otherwise obtained by a device. For example, a device may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, various methods described herein can be provided via a storage means (e.g., random access memory (RAM), read-only memory (ROM), a physical storage medium such as a compact disc (CD) or floppy disk, etc.), such that a device may obtain the various methods upon coupling or providing the storage means to the device.

It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes, and variations may be made in the arrangement, operation, and details of the systems, methods, and apparatus described herein without departing from the scope of the claims.

What is claimed is:
1. A method performed by an electronic device, comprising: receiving a set of images; determining a motion region and a static region based on the set of images; extracting, at a first rate, first features from the motion region; extracting, at a second rate that is different from the first rate, second features from the static region; caching the second features; and detecting at least one object based on at least a portion of the first features.
2. The method of claim 1, further comprising: detecting movement in the static region; and retrieving information from a cache in response to the detected movement.
3. The method of claim 2, wherein the information comprises cached features, and wherein the method further comprises: determining a region of interest (ROI) based on the cached features; and identifying an object based on the ROI and the cached features.
4. The method of claim 2, wherein the information comprises a label, and wherein the method further comprises presenting the label.
5. The method of claim 1, wherein extracting the first features and extracting the second features is performed on a union of the motion region and one or more regions of interest (ROIs) in the static region.
6. The method of claim 1, wherein the second rate is lower than the first rate.
7. The method of claim 1, wherein a first operation thread that operates at the first rate comprises extracting the first features and detecting the at least one object based on the at least a portion of the first features, and a second operation thread that operates at the second rate comprises extracting the second features and caching the second features.
8. The method of claim 7, wherein the second operation thread further comprises: determining at least one region of interest (ROI) in the static region; and detecting at least one object based on at least a portion of the second features in the at least one ROI.
9. The method of claim 8, wherein the at least one ROI comprises a set of ROIs in the static region, and wherein extracting the second features comprises extracting features from at most a subset of the set of ROIs for each image of the set of images.
10. The method of claim 1, wherein the first features and the second features are cached in a shared feature map.
11. The method of claim 1, further comprising: classifying the at least one object to produce at least one label; and presenting the at least one label.
12. An electronic device, comprising: a processor configured to: receive a set of images; determine a motion region and a static region based on the set of images; extract, at a first rate, first features from the motion region; extract, at a second rate that is different from the first rate, second features from the static region; cache the second features; and detect at least one object based on at least a portion of the first features.
13. The electronic device of claim 12, wherein the processor is configured to: detect movement in the static region; and retrieve information from a cache in response to the detected movement.
14. The electronic device of claim 13, wherein the information comprises cached features, and wherein the processor is configured to: determine a region of interest (ROI) based on the cached features; and identify an object based on the ROI and the cached features.
15. The electronic device of claim 13, wherein the information comprises a label, and wherein the processor is configured to present the label.
16. The electronic device of claim 12, wherein extracting the first features and extracting the second features is performed on a union of the motion region and one or more regions of interest (ROIs) in the static region.
17. The electronic device of claim 12, wherein the second rate is lower than the first rate.
18. The electronic device of claim 12, wherein a first operation thread that operates at the first rate comprises extracting the first features and detecting the at least one object based on the at least a portion of the first features, and a second operation thread that operates at the second rate comprises extracting the second features and caching the second features.
19. The electronic device of claim 18, wherein the second operation thread further comprises: determining at least one region of interest (ROI) in the static region; and detecting at least one object based on at least a portion of the second features in the at least one ROI.
20. The electronic device of claim 19, wherein the at least one ROI comprises a set of ROIs in the static region, and wherein extracting the second features comprises extracting features from at most a subset of the set of ROIs for each image of the set of images.
21. The electronic device of claim 12, wherein the first features and the second features are cached in a shared feature map.
22. The electronic device of claim 12, wherein the processor is configured to: classify the at least one object to produce at least one label; and present the at least one label.
23. An apparatus, comprising: means for receiving a set of images; means for determining a motion region and a static region based on the set of images; means for extracting, at a first rate, first features from the motion region; means for extracting, at a second rate that is different from the first rate, second features from the static region; means for caching the second features; and means for detecting at least one object based on at least a portion of the first features.
24. The apparatus of claim 23, further comprising: means for detecting movement in the static region; and means for retrieving information from a cache in response to the detected movement.
25. The apparatus of claim 23, wherein the means for extracting the first features and the means for extracting the second features are based on a union of the motion region and one or more regions of interest (ROIs) in the static region.
26. The apparatus of claim 23, wherein the second rate is lower than the first rate.
27. A non-transitory tangible computer-readable medium storing computer executable code, comprising: code for causing an electronic device to receive a set of images; code for causing the electronic device to determine a motion region and a static region based on the set of images; code for causing the electronic device to extract, at a first rate, first features from the motion region; code for causing the electronic device to extract, at a second rate that is different from the first rate, second features from the static region; code for causing the electronic device to cache the second features; and code for causing the electronic device to detect at least one object based on at least a portion of the first features.
28. The computer-readable medium of claim 27, further comprising: code for causing the electronic device to detect movement in the static region; and code for causing the electronic device to retrieve information from a cache in response to the detected movement.
29. The computer-readable medium of claim 27, wherein the code for causing the electronic device to extract the first features and the code for causing the electronic device to extract the second features are based on a union of the motion region and one or more regions of interest (ROIs) in the static region.
30. The computer-readable medium of claim 27, wherein the second rate is lower than the first rate.